Skip to content

fix(lock): implement Redlock single-instance pattern in LockManagerService#537

Open
romanetar wants to merge 1 commit into
mainfrom
fix/release-on-failed-acquire-and-add-ownership-tokens
Open

fix(lock): implement Redlock single-instance pattern in LockManagerService#537
romanetar wants to merge 1 commit into
mainfrom
fix/release-on-failed-acquire-and-add-ownership-tokens

Conversation

@romanetar

@romanetar romanetar commented May 4, 2026

Copy link
Copy Markdown
Collaborator

ref https://app.clickup.com/t/86b9f3a22

Recommended actions for a follow-up ticket:

  1. Move the SendAttendeeInvitationEmail::dispatch call outside the lock callback (dispatch after the lock is released).
  2. Consider a tighter explicit lifetime (e.g. 30 s) that matches the realistic worst-case DB write time rather than the 3600 s default.
  3. Fix the missing . in the key: 'ticket_type.' . $type_id . '.promo_code.' . $promo_code_val . '.sell.lock'.

Summary by CodeRabbit

  • New Features

    • Added atomic compare-and-delete for cache entries.
    • Locks now generate per-attempt ownership tokens and only release when owned.
    • Lock acquisition retries with exponential backoff.
  • Tests

    • Added tests validating ownership semantics, that failed acquires never delete others' locks, no-op release when not acquired, token cleanup, and single-attempt lock creation.

@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR adds an atomic compare-and-delete cache method to ICacheService, implements it in Redis via a Lua eval, updates Redis conditional set usages, refactors LockManagerService to use per-call random ownership tokens with exponential backoff and ownership-aware releases, and adds tests validating ownership semantics.

Changes

Ownership-Based Distributed Locking

Layer / File(s) Summary
Interface Contract
Libs/Utils/ICacheService.php
New deleteIfValueMatches(string $key, string $expectedValue): bool method declares atomic compare-and-delete semantics for conditional key deletion.
Redis Conditional Sets
app/Services/Utils/RedisCacheService.php
incCounter and addSingleValue now use SET ... NX (with optional EX) for init/set-if-missing; TTL applied only on successful conditional set.
Atomic Redis Delete
app/Services/Utils/RedisCacheService.php
Added deleteIfValueMatches implemented with a Lua EVAL script to atomically compare the stored value and delete when equal; wrapped with existing retry logic.
LockManager State
app/Services/Utils/LockManagerService.php
Introduces private $tokens map and adjusted backoff constants/comments.
Acquire with Token & Backoff
app/Services/Utils/LockManagerService.php
acquireLock() generates a per-call random token, retries addSingleValue with exponential backoff, stores token on success, and throws UnacquiredLockException after retries.
Ownership-aware Release
app/Services/Utils/LockManagerService.php
releaseLock() checks token ownership and calls deleteIfValueMatches(name, token) only when owned, then unsets the token; otherwise it's a no-op.
Wrapper Guarding Release
app/Services/Utils/LockManagerService.php
lock() sets an $acquired flag and only calls releaseLock() in finally when acquisition succeeded.
Tests / Verification
tests/Unit/Services/LockManagerServiceOwnershipTest.php
New tests assert failed acquirers do not delete others' locks, release without acquire is a no-op, tokens are cleared after successful cycles, and addSingleValue is invoked once with token and lifetime.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LMS as LockManagerService
    participant RCS as RedisCacheService
    participant Redis

    rect rgba(100, 150, 200, 0.5)
    Note over Client,Redis: Lock Acquisition (Success Path)
    Client->>LMS: acquireLock(name, lifetime)
    LMS->>LMS: Generate token = random_bytes(16)
    loop Exponential Backoff Retry Loop
        LMS->>RCS: addSingleValue(name, token, lifetime)
        RCS->>Redis: SET name token NX EX lifetime
        Redis-->>RCS: OK (or nil)
        alt First attempt succeeds
            RCS-->>LMS: true
            LMS->>LMS: Store $tokens[name] = token
            LMS-->>Client: Acquisition complete
        else Retry needed
            RCS-->>LMS: false
            LMS->>LMS: Sleep with exponential backoff
        end
    end
    end

    rect rgba(200, 150, 100, 0.5)
    Note over Client,Redis: Lock Release (Ownership Check)
    Client->>LMS: releaseLock(name)
    LMS->>LMS: Check $tokens[name] exists?
    alt Owned by this process
        LMS->>RCS: deleteIfValueMatches(name, token)
        RCS->>Redis: EVAL Lua script (compare & delete)
        Redis->>Redis: GET name == token?
        alt Values match
            Redis->>Redis: DEL name
            Redis-->>RCS: 1 (deleted)
        else Mismatch (lock stolen)
            Redis-->>RCS: 0 (not deleted)
        end
        RCS-->>LMS: true/false
        LMS->>LMS: Unset $tokens[name]
        LMS-->>Client: Released
    else Not owned
        LMS-->>Client: Return (no-op)
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble tokens in the night,
Small bytes of luck that hold on tight,
Lua checks and backoff leaps,
No stolen keys in midnight keeps,
Owners sleep while rabbits write.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(lock): implement Redlock single-instance pattern in LockManagerService' accurately describes the main change: adding ownership tokens and atomic compare-and-delete logic to LockManagerService to implement the Redlock pattern.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/release-on-failed-acquire-and-add-ownership-tokens

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 4, 2026

Copy link
Copy Markdown

📘 OpenAPI / Swagger preview

➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/

This page is automatically updated on each push to this PR.

@romanetar romanetar requested a review from smarcet May 4, 2026 14:42

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
app/Services/Utils/LockManagerService.php (1)

62-70: 💤 Low value

Minor: Unnecessary sleep before throwing on final retry.

When attempt >= MaxRetries - 1, the code still executes usleep before throwing. This adds ~400ms of unnecessary delay on the final failed attempt.

Consider moving the retry check before the sleep:

Suggested reorder
-        $wait_interval = (int)(self::BackOffBaseInterval * (self::BackOffMultiplier ** $attempt));
-        Log::debug(sprintf("LockManagerService::acquireLock name %s retrying in %s µs (attempt %s)", $name, $wait_interval, $attempt));
-        usleep($wait_interval);
         if ($attempt >= (self::MaxRetries - 1)) {
             Log::error(sprintf("LockManagerService::acquireLock name %s lifetime %s ERROR MAX RETRIES attempt %s", $name, $lifetime, $attempt));
             throw new UnacquiredLockException(sprintf("lock name %s", $name));
         }
+        $wait_interval = (int)(self::BackOffBaseInterval * (self::BackOffMultiplier ** $attempt));
+        Log::debug(sprintf("LockManagerService::acquireLock name %s retrying in %s µs (attempt %s)", $name, $wait_interval, $attempt));
+        usleep($wait_interval);
         ++$attempt;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/Services/Utils/LockManagerService.php` around lines 62 - 70, The loop in
LockManagerService::acquireLock sleeps (usleep) even when $attempt >=
(self::MaxRetries - 1), causing an unnecessary delay before throwing
UnacquiredLockException; reorder the logic so the check for final retry (if
$attempt >= (self::MaxRetries - 1)) occurs before calling usleep and before
incrementing $attempt, log and throw immediately on final attempt, otherwise
perform the usleep, increment $attempt and continue the loop to preserve backoff
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/Services/Utils/LockManagerService.php`:
- Around line 62-70: The loop in LockManagerService::acquireLock sleeps (usleep)
even when $attempt >= (self::MaxRetries - 1), causing an unnecessary delay
before throwing UnacquiredLockException; reorder the logic so the check for
final retry (if $attempt >= (self::MaxRetries - 1)) occurs before calling usleep
and before incrementing $attempt, log and throw immediately on final attempt,
otherwise perform the usleep, increment $attempt and continue the loop to
preserve backoff behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4fb76b4-55ed-4ec3-a248-fdf0d070fc95

📥 Commits

Reviewing files that changed from the base of the PR and between c39160a and b3dbd7a.

📒 Files selected for processing (4)
  • Libs/Utils/ICacheService.php
  • app/Services/Utils/LockManagerService.php
  • app/Services/Utils/RedisCacheService.php
  • tests/Unit/Services/LockManagerServiceOwnershipTest.php

@romanetar romanetar force-pushed the fix/release-on-failed-acquire-and-add-ownership-tokens branch from b3dbd7a to acb8447 Compare May 7, 2026 17:33
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

📘 OpenAPI / Swagger preview

➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/

This page is automatically updated on each push to this PR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/Unit/Services/LockManagerServiceOwnershipTest.php (1)

139-144: 💤 Low value

Consider asserting the token is non-empty for stronger ownership guarantee coverage.

Mockery::type('string') accepts any string including ''. A complementary assertion with Mockery::on(fn($v) => strlen($v) >= 16) (or similar) would confirm the service is actually generating a meaningful random token rather than an empty or trivial value.

♻️ Tighter token constraint
-              ->with('test.lock', Mockery::type('string'), 3600)
+              ->with('test.lock', Mockery::on(fn(string $v) => strlen($v) >= 16), 3600)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php` around lines 139 -
144, Replace the loose Mockery::type('string') expectation in
LockManagerServiceOwnershipTest (the mock of ICacheService used with
addSingleValue) with a stricter constraint that asserts the token is
non-empty/strong (e.g. Mockery::on(fn($v) => is_string($v) && strlen($v) >= 16))
or add an additional expectation using Mockery::on to verify token length,
keeping the same call to addSingleValue and the deleteIfValueMatches
expectation; target the mock for addSingleValue on the ICacheService to ensure
the generated token is meaningful rather than allowing an empty string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php`:
- Around line 139-144: Replace the loose Mockery::type('string') expectation in
LockManagerServiceOwnershipTest (the mock of ICacheService used with
addSingleValue) with a stricter constraint that asserts the token is
non-empty/strong (e.g. Mockery::on(fn($v) => is_string($v) && strlen($v) >= 16))
or add an additional expectation using Mockery::on to verify token length,
keeping the same call to addSingleValue and the deleteIfValueMatches
expectation; target the mock for addSingleValue on the ICacheService to ensure
the generated token is meaningful rather than allowing an empty string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 757f3521-79bb-45c3-a48d-09cc2ce97243

📥 Commits

Reviewing files that changed from the base of the PR and between b3dbd7a and acb8447.

📒 Files selected for processing (4)
  • Libs/Utils/ICacheService.php
  • app/Services/Utils/LockManagerService.php
  • app/Services/Utils/RedisCacheService.php
  • tests/Unit/Services/LockManagerServiceOwnershipTest.php
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/Services/Utils/RedisCacheService.php
  • app/Services/Utils/LockManagerService.php
  • Libs/Utils/ICacheService.php

…rvice

Signed-off-by: romanetar <roman_ag@hotmail.com>
@romanetar romanetar force-pushed the fix/release-on-failed-acquire-and-add-ownership-tokens branch from acb8447 to c6e6473 Compare May 8, 2026 14:00
@github-actions

github-actions Bot commented May 8, 2026

Copy link
Copy Markdown

📘 OpenAPI / Swagger preview

➡️ https://OpenStackweb.github.io/summit-api/openapi/pr-537/

This page is automatically updated on each push to this PR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/Unit/Services/LockManagerServiceOwnershipTest.php (1)

139-145: ⚡ Quick win

Assert token identity across acquire and release, not just token type.

At Line [142] and Line [144], the test validates string token creation and release call count, but it does not verify deleteIfValueMatches receives the same token captured during addSingleValue. A token-mismatch regression could still pass.

Proposed test hardening
     public function testAddSingleValueCalledOnceWithTokenAndLifetime(): void
     {
         $cache = Mockery::mock(ICacheService::class);
+        $token = null;
         $cache->shouldReceive('addSingleValue')
               ->once()
-              ->with('test.lock', Mockery::type('string'), 3600)
+              ->with(
+                  'test.lock',
+                  Mockery::on(function ($value) use (&$token) {
+                      if (!is_string($value) || $value === '') return false;
+                      $token = $value;
+                      return true;
+                  }),
+                  3600
+              )
               ->andReturn(true);
-        $cache->shouldReceive('deleteIfValueMatches')->once()->andReturn(true);
+        $cache->shouldReceive('deleteIfValueMatches')
+              ->once()
+              ->with('test.lock', Mockery::on(fn($value) => $value === $token))
+              ->andReturn(true);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php` around lines 139 -
145, The test currently only asserts the token is a string and that
deleteIfValueMatches is called, but not that it's the same token; update
LockManagerServiceOwnershipTest to capture the token passed to
ICacheService::addSingleValue (use Mockery capture or an on/closure) and then
assert ICacheService::deleteIfValueMatches is invoked with the same captured
token (e.g., expect deleteIfValueMatches('test.lock', <capturedToken>)). Keep
addSingleValue and deleteIfValueMatches expectations tied to the captured
variable so the test fails on token mismatches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/Unit/Services/LockManagerServiceOwnershipTest.php`:
- Around line 139-145: The test currently only asserts the token is a string and
that deleteIfValueMatches is called, but not that it's the same token; update
LockManagerServiceOwnershipTest to capture the token passed to
ICacheService::addSingleValue (use Mockery capture or an on/closure) and then
assert ICacheService::deleteIfValueMatches is invoked with the same captured
token (e.g., expect deleteIfValueMatches('test.lock', <capturedToken>)). Keep
addSingleValue and deleteIfValueMatches expectations tied to the captured
variable so the test fails on token mismatches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5de5ba1-752b-4d1c-9d24-3a6a5e3e917a

📥 Commits

Reviewing files that changed from the base of the PR and between acb8447 and c6e6473.

📒 Files selected for processing (4)
  • Libs/Utils/ICacheService.php
  • app/Services/Utils/LockManagerService.php
  • app/Services/Utils/RedisCacheService.php
  • tests/Unit/Services/LockManagerServiceOwnershipTest.php
🚧 Files skipped from review as they are similar to previous changes (3)
  • Libs/Utils/ICacheService.php
  • app/Services/Utils/LockManagerService.php
  • app/Services/Utils/RedisCacheService.php

$prop->setAccessible(true);
$this->assertEmpty($prop->getValue($service));
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test: Redis unavailable during releaseLock

There is no test for the case where deleteIfValueMatches returns false (Redis connection fails). When that happens the lock is not deleted from Redis but unset($this->tokens[$name]) still runs — the local token handle is gone. The lock persists in Redis until TTL expiry with no way to retry the release from application code.

Suggested outline:

public function testReleaseLockWhenRedisDownLeavesKeyUntilTtl(): void
{
    $cache = Mockery::mock(ICacheService::class);
    $cache->shouldReceive('addSingleValue')->once()->andReturn(true);
    // simulate Redis unavailable — deleteIfValueMatches returns false
    $cache->shouldReceive('deleteIfValueMatches')->once()->andReturn(false);

    $lock = new LockManagerService($cache);
    $lock->lock('resource.lock', fn() => null);
    // Observable effect: a subsequent acquireLock on the same key will fail
    // because the key was NOT deleted — only TTL expiry can free it.
}

Documenting this as a test makes the failure mode explicit for future readers.

return $conn->set($key, $value, 'NX') !== null;
}, false);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing integration test: atomic SET NX EX

All tests for this path mock ICacheService, so the actual Redis command added here is never exercised against a real server. Two things that only an integration test can verify:

  1. Driver compatibility — the variadic form set($key, $value, 'EX', $ttl, 'NX') works with Predis 2.x. If the driver is ever switched to PhpRedis, set() returns false on an NX-miss (not null), silently breaking the !== null check.
  2. Atomicity — the fix guarantees the key cannot exist without a TTL; only a real TTL key call after addSingleValue can confirm there is no gap.

Recommend a @group integration test that:

  • calls addSingleValue($key, $token, 30) against a real test Redis
  • reads TTL $key and asserts it is between 1 and 30 s
  • calls addSingleValue again on the same key and asserts it returns false (NX semantics)

@@ -46,22 +50,24 @@ public function __construct(ICacheService $cache_service){
*/
public function acquireLock(string $name, int $lifetime = 3600):LockManagerService

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default TTL of 3600 s creates hour-long stuck locks in queue workers

This service is resolved inside long-lived queue workers (ProcessPaymentGatewayRefundJob, ProcessSummitOrderPaymentConfirmation, RevokeSummitOrder, etc.). If the worker is killed (SIGKILL, OOM, deployment restart) while holding a lock, the finally block never executes and the Redis key persists for 3600 s — blocking ticket sales on the affected key for up to one hour.

The PR description already flags this: "Consider a tighter explicit lifetime (e.g. 30 s) that matches the realistic worst-case DB write time."

Suggested fix — pass an explicit $lifetime at the call sites rather than relying on the default:

// SummitOrderService.php:951, 968, 1539
$this->lock_service->lock(
    'ticket_type.' . $ticket_type->getId() . '.sell.lock',
    function () use (...) { ... },
    30  // worst-case DB write time in seconds
);

private $cache_service;

/** @var array<string,string> lock-name → per-call ownership token */
private array $tokens = [];

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable per-acquisition state on a singleton service

LockManagerService is bound as a singleton (BaseServicesProvider.php:143). Before this PR the service was stateless; this line introduces the first mutable instance state. In the current stack (PHP-FPM, synchronous queue workers) this is safe because only one coroutine runs at a time per process. However, if Laravel Octane / Swoole is ever introduced, two concurrent coroutines racing on the same key name would overwrite each other's token here — coroutine A's releaseLock would present coroutine B's token to deleteIfValueMatches, releasing B's live lock.

Suggested mitigation: either change the binding to scoped (Octane-safe per-request singleton), or remove instance state entirely by returning the token from acquireLock and accepting it as a parameter in releaseLock.

{
return $this->retryOnConnectionError(function ($conn) use ($counter_name, $ttl) {
if ($conn->setnx($counter_name, 1)) {
if ($conn->set($counter_name, 1, ['NX' => true]) !== null) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incCounter partial fix — TTL assignment still non-atomic

addSingleValue was made fully atomic (SET key value EX $ttl NX in one command), but incCounter only swaps setnx for SET NX while keeping the expire as a separate call. If the process crashes between these two commands the counter key will exist indefinitely with no TTL — the same race that addSingleValue was fixed to prevent.

Suggested fix — collapse into a single atomic command, consistent with the approach used in addSingleValue:

if ($conn->set($counter_name, 1, ['EX' => (int)$ttl, 'NX' => true]) !== null) {
    return 1;
}

@smarcet

smarcet commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Malformed lock key — missing dot separator ()

This line cannot be commented inline since SummitOrderService.php is not in the diff, but the issue is worth tracking here.

The lock key is missing a . between $type_id and 'promo_code.':

// Current — produces e.g. "ticket_type.42promo_code.SUMMER25.sell.lock"
$this->lock_service->lock('ticket_type.' . $type_id . 'promo_code.' . $promo_code_val . '.sell.lock', ...)

// Intended — "ticket_type.42.promo_code.SUMMER25.sell.lock"
$this->lock_service->lock('ticket_type.' . $type_id . '.promo_code.' . $promo_code_val . '.sell.lock', ...)

With integer type IDs there is no key collision today, but the key is semantically malformed and will confuse any tooling (monitoring, manual Redis inspection, key-expiry scripts) that parses the key pattern. There is also no test asserting the exact key string passed to ILockManagerService::lock() in this code path — a mock expectation on the key argument would catch this immediately.

The PR description already flags this as a follow-up: "Fix the missing . in the key." Recommend addressing it in a dedicated follow-up ticket before the ownership token changes are deployed, so the key format stabilises.

@smarcet smarcet left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanetar please review comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants